DNA sequences can be entered and edited using either a line editor (one line at a time) or the more familiar “cut and paste” editor. The line editor (ENTER/LINE EDIT DNA SEQUENCE from the FILE menu) allows you to concentrate on a small region of the sequence to edit. It also allows you to redefine the keyboard for easier entry of nucleotides. The cut and paste editor (EDIT DNA SEQUENCE from the FILE menu) shows you about 1000 nts at a time and will allow you to copy sequences from a “donor” DNA to be pasted into the current DNA you are editing. You can go from one editor to the other by menu selections. Sequences can be confirmed (CONFIRM DNA SEQUENCE from the FILE menu) by re-typing the sequence, or by having the Mac speak the sequence to you. The confirm mode is also accessible from menus in either of the editors.
DNA sequences can be loaded into the DNA Inspector to be analyzed by selecting LOAD SEQUENCE FILE from the FILE menu. The standard selection box will appear and you can select the DNA file to load. Files can be saved with the SAVE SEQUENCE FILE or SAVE SEQUENCE FILE AS... options from the FILE menu. SAVE will replace your old file with the current one in the program. SAVE AS... will allow you to save the current sequence under a different name.
DNA sequences or fragments of DNA sequences can be joined or inverted and then joined using the JOIN/INVERT DNA FRAGMENTS choice from the FILE menu. You can select up to 10 different DNA files to serve as donors of the DNA pieces you want to join. After selecting the files, you enter values for the first and last nucleotides to be used from each DNA and whether you want to invert the sequence before joining it to the previous one. After confirming your choices, the program will generate the new DNA and ask if you want to save it. This aspect of the program is particularly useful for creating new constructs from previously entered DNA sequence files.
The DNA Inspector comes with a database of 365 enzymes, 100 of which are in the default table that appears in the restriction enzyme analysis module. Choosing RESTRICTION ENZYME TABLE from the MODIFY menu allows you to edit the available table or to define new tables of your own. Any table you generate can be defined as the default table and will automatically be used by the restriction enzyme analysis part of the program. The table may be modified by changing the recognition site of a known enzyme, entering a new enzyme, or selecting from the database provided. You can also enter any sequence you want to recognize into the table (e.g. TATAA, or other genetic regulatory element). Edited tables can be saved under their own names or can be made the default table.
The DNA Inspector can read and write TEXT files as well as DNA Inspector files. The SAVE SEQUENCE AS TEXT FILE option from the FILE menu will save just the DNA sequence as a TEXT only (ASCII) file. You can save it as a long contiguous string of nts, or as a set of 50 nt lines. These TEXT files are useful to transfer to word processing programs (which can read TEXT files) and to other computers. It is also possible to read TEXT files created elsewhere and convert them to DNA Inspector files using the CONVERT TEXT DNA SEQUENCE ---> DNA INSPECTOR FILE option from the MODIFY menu. Read the manual to see the finer points of this option. DNA Inspector files (see manual Appendix for detailed format) contain not only the DNA sequence, but information about the first nucleotide of the DNA, linear vs. circular, and associated comments.
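The two TEXT layouts described above (one contiguous string, or lines of 50 nts) can be illustrated with a short Python sketch. This is not the program's own code; the function name format_sequence is made up for the example.

```python
def format_sequence(seq: str, line_length: int = 50) -> str:
    """Break a sequence into fixed-width lines; the last line may be shorter."""
    return "\n".join(
        seq[i:i + line_length] for i in range(0, len(seq), line_length)
    )


seq = "ACGT" * 30                  # a 120 nt example sequence
as_lines = format_sequence(seq)    # lines of 50, 50, and 20 nts
as_string = seq                    # the "long contiguous string" layout
```

Joining the lines back together (removing the line breaks) recovers the original sequence, which is what a program reading such a TEXT file has to do.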
You usually will not have to use this option. The DNA Inspector IIe consists of a number of separate modules, each of which must know where the others are located at all times. In order to do this, the program stores the exact pathnames to each of the modules in a location that is accessible to each module. If you move a module or rename it, the program will not be able to identify where it is located. The DEFINE NEW LOCATIONS FOR PROGRAM MODULES option from the MODIFY menu will allow you to inform the program of any changes you might have made in program location or name.
It is often desirable to work with only part of a longer DNA sequence in the analysis routines. This can be done in either of two ways: 1) edit a sequence to eliminate the parts you do not want to include in your analyses and save it under a different name, or 2) choose WORK WITH A FRAGMENT OF THE CURRENT DNA from the MODIFY menu. This latter option will allow you to define the starting and ending nucleotides you want to include in the analysis. It will also change the name of the DNA to indicate that you are only working with a fragment (e.g. DNA will become DNA [127-1000]).
The SEARCH FOR INVERTED REPEAT and SEARCH FOR DIRECT REPEAT options from the ANALYZE menu allow you to search for repeated sequences in the DNA. You can define the length of the repeat as well as the maximum number of mismatches and the maximum distance between the two parts of the repeat that you will allow. The program will draw a “repeated sequence” map that visually displays the repeats found. This map can often point to regions of secondary structure and potential base-pairing along the length of the sequence. You are also presented with a list of the actual sequences and their locations on the DNA.
SEARCHING DNA WITH SEQUENCE ENTERED AT KEYBOARD from the ANALYZE menu is a very flexible and powerful searching routine. The sequence you type in can have a gap, the maximum length of which you can specify. You can also specify the maximum number of allowable mismatches for each part of the sequence (on each side of the gap). Furthermore, you can identify positions that must match exactly by typing the letters in UPPERCASE. Only nucleotides in lower case can have mismatches. This is often useful for finding consensus sequences, or sequences that might be in two parts.
The SEARCH DNA WITH ANOTHER DNA FRAGMENT option from the ANALYZE menu will do exactly what the menu choice says. You can load a DNA from disk and either define the first and last nucleotides to be used as the search sequence, or you can copy the sequence directly from the donor DNA on the screen.
RESTRICTION ENZYME ANALYSIS from the ANALYZE menu creates either linear or circular restriction maps of the current DNA. It can also predict the gel electrophoresis pattern for restriction enzyme digested DNAs in either single enzyme digests, or multiple enzyme digests with up to four enzymes. For rapid analysis of new DNA sequences, you can choose the “Minimap” analysis. This will produce a two page printout of restriction maps for all 100 enzymes in your default table. Optionally, you can also list the sites for each of the cuts.
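The fragment sizes behind a predicted gel pattern follow directly from the cut positions. The Python sketch below shows the arithmetic for linear and circular DNA. It is an illustration only, not the program's algorithm; the function name and the convention that a cut at position k falls between nucleotides k and k+1 are assumptions made for the example.

```python
def fragment_lengths(seq_len, cuts, circular=False):
    """Return fragment sizes, largest first, for cuts strictly inside the DNA.

    A cut at position k is taken to fall between nucleotides k and k+1.
    """
    cuts = sorted(set(cuts))
    if not cuts:
        return [seq_len]                      # uncut DNA: one piece
    if circular:
        # n cuts in a circle give n fragments; the last one wraps around.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(seq_len - cuts[-1] + cuts[0])
    else:
        # n cuts in a linear molecule give n + 1 fragments.
        bounds = [0] + cuts + [seq_len]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


linear = fragment_lengths(1000, [100, 400])                   # [600, 300, 100]
circle = fragment_lengths(1000, [100, 400], circular=True)    # [700, 300]
```

For a multiple enzyme digest, the cut positions of all the enzymes are simply pooled into one list before the sizes are calculated.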
BASE COMPOSITION ANALYSIS from the ANALYZE menu provides two options in addition to the simple base content. First, you can view the DNA as a map of 20 equal pieces (circular or linear), with each piece having a color and pattern reflecting the G+C content. This will print in color on the ImageWriter II with the 4 color ribbon. The second option will display a sliding base composition map. The map represents the G+C content of the DNA along the entire DNA length. It is ideal for finding or illustrating regions of A+T or G+C rich DNA.
Selecting HOMOLOGY MATRIX ANALYSIS from the ANALYZE menu will allow you to construct dot matrix homology plots. You can do self-homology or compare two different DNAs. Self-homology allows you to compare the DNA to itself, its complement, its reverse, or its inverse sequence. You can set the analysis length as well as the allowable number of mismatches. Output on the screen is 200 x 200, but can be as great as 2000 x 2000 on the printer (a single dot represents a single nucleotide). Regions of interest can be examined in more detail by “zooming in” on the plot by selecting a region with the mouse.
The OPEN READING FRAME/PEPTIDE ANALYSIS choice from the ANALYZE menu will search for open reading frames in your DNA and will conduct some analyses of the open reading frames it finds. You can specify the minimum length peptide you want to view and can look at all open reading frames or just those starting with ATG. The peptide analyses include: molecular weight, isoelectric point, three letter and one letter amino acid sequence, Hopp-Woods Hydrophilicity plot, amino acid composition, and codon usage tables.
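To make the two search choices concrete (all open reading frames, or only those starting with ATG, with a minimum length), here is a minimal Python sketch. It is not the program's routine: it scans only the three forward frames, reports only frames closed by a stop codon, and the names find_orfs, min_codons, and require_atg are invented for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10, require_atg=True):
    """Return (first_nt, last_nt, frame) tuples, 1-based, stop codon excluded."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        # Without the ATG requirement a frame is open from its first codon.
        start = None if require_atg else frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i, frame + 1))
                # After a stop, wait for ATG or reopen at the next codon.
                start = None if require_atg else i + 3
            elif start is None and codon == "ATG":
                start = i
    return orfs


example = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
found = find_orfs(example, min_codons=5)     # [(3, 17, 3)]
```

Raising min_codons to 6 returns an empty list for this example, since the frame holds only five codons before the stop.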
The M13 SHOTGUN SEQUENCE ALIGNMENT selection on the ANALYZE menu will take up to 10 sequences (totalling up to 100,000 nts) and attempt to align them and construct a contiguous sequence from the pieces. You can edit and save the aligned sequences as well as the newly generated “contig” sequence.
AUTOMATED DNA SEQUENCE ANALYSIS from the ANALYZE menu will conduct a series of analyses for you without requiring any input during the run. This is an ideal way to do your favorite set of analyses on any new DNA sequences of interest. All the analysis parameters are defined before the run and all the results will be printed in your absence. Analysis options include: base composition, self homology matrix, restriction enzyme patterns, open reading frame, direct and inverted repeats, and printing the DNA sequence and comments.
SEARCH RESTRICTION ENZYME DATA BASE from the ANALYZE menu allows you to search the data base in two different ways. First, you can select any enzyme and find all isoschizomers for that enzyme. Second, you can enter any DNA sequence up to 75 nts and find out which enzymes in the data base recognize a sequence within that search sequence. This is useful for very rapidly searching a region of interest for useful restriction sites.
This option from the MODIFY menu lets you define a fragment of the DNA that is currently loaded in the program to be used for any analyses you choose to do. You will be asked to define the first and last nucleotides to be included in the new DNA piece. The name of the DNA (as it appears in all program output) will be changed to indicate that it is only a fragment of the DNA file on the disk. The name will indicate the nucleotides included. For example, if the DNA name is MyDNA, a fragment would have a name something like MyDNA [207 --> 554].
This option allows you to define a specific DNA or fragment of DNA saved on disk to be used as the search sequence. You can define the search sequence by: 1) specifying the first and last nucleotides, or 2) cutting the search sequence from a display of the DNA sequence from the disk. Once the search sequence is defined, you can define the maximum number of allowable mismatches.
This is a very powerful routine that has several built in features allowing you to precisely define search criteria. Any search sequence in uppercase must match exactly. Search sequences in lowercase can have mismatches up to the extent you define. The search sequence may have a mixture of upper and lowercase nts. For example, the sequence ‘acGTGaatCC’ will find matches with GTG in positions 3-5 and CC in positions 9-10. Any mismatches would be in ‘ac’ or ‘aat’. You can also specify sequences in two parts, separated by a hyphen (e.g. ‘atCGC-GaTTC’). In this case you may specify the maximum separation of the parts and the extent of allowable mismatching in each part of the search sequence.
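The matching rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's code: the names match_part and search are invented, the mismatch limit is applied to each part separately, and at most two parts (one hyphen) are handled.

```python
def match_part(seq, pos, part, max_mismatches):
    """True if `part` matches seq at 0-based pos.

    Uppercase letters must match exactly; lowercase letters may mismatch.
    """
    if pos < 0 or pos + len(part) > len(seq):
        return False
    mismatches = 0
    for s, p in zip(seq[pos:pos + len(part)], part):
        if s != p.upper():
            if p.isupper():
                return False          # an exact-match position failed
            mismatches += 1
    return mismatches <= max_mismatches


def search(seq, pattern, max_mismatches=0, max_gap=0):
    """Return 1-based start positions of each part for every hit."""
    seq = seq.upper()
    parts = pattern.split("-")        # assumes one hyphen at most
    hits = []
    for pos in range(len(seq)):
        if not match_part(seq, pos, parts[0], max_mismatches):
            continue
        if len(parts) == 1:
            hits.append((pos + 1,))
            continue
        after = pos + len(parts[0])
        for gap in range(max_gap + 1):
            if match_part(seq, after + gap, parts[1], max_mismatches):
                hits.append((pos + 1, after + gap + 1))
    return hits


one_part = search("TTAGGTGAATCCTT", "acGTGaatCC", max_mismatches=1)
two_part = search("ATCGCAAAGATTC", "atCGC-GaTTC", max_gap=5)
```

Here one_part is [(3,)], because the only difference from the search sequence is in the lowercase ‘c’, and two_part is [(1, 9)], because the second part is found three nucleotides after the first.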
The M13 analysis routine is designed to align sequences that were originally part of a contiguous sequence in the DNA. It will find any match of 30 nts or longer that has no more than the specified number of mismatches. During the alignment of the sequences, progress is indicated on the screen. The program attempts to make some intelligent decisions about which sequences could have been part of an original contig and will not show alignments of sequences that could not have come from the same original sequence. If you have doubts about what is going on, check the manual for more details, and try examining the sequences by the homology matrix analysis. Gaps and other anomalies may prevent the M13 routine from aligning your sequences, but any similarity should show up in the homology analysis.
Viewing and editing the contig sequence(s) allows you to see the contig sequence along the top of the screen with each of the contributing sequences aligned below it. As you go through the contig, the program points out any positions of disagreement in the aligned sequences. You can accept the default contig nucleotide (‘N’), or change it to whatever you think is appropriate. The program will take you through each contig sequence generated and point out each disagreement. When you are finished, you can save the contig sequence.
After the M13 analysis, you can generate output in a number of ways. Anything you view on the screen can also be printed or saved to disk as a TEXT file to be read by any text editing program. You can view the aligned sequences, a table of the matches found by the program, and a summary of all the contig assignments. You can also save the contig sequence(s) as DNA Inspector files. The program will not only save the DNA sequences but will generate comments explaining how the sequence was derived. These comments can be viewed from the main screen since they are stored the same as comments you can enter for any DNA file created.
The amino acid sequence of an open reading frame can be displayed in either three letter code, one letter code, or both codes. The three letter display puts 20 amino acids on a line and the one letter display puts 70 amino acids per line.
Peptide molecular weight and charge at neutral pH can be displayed for each ORF found. Molecular weight is calculated by adding the molecular weights for each amino acid in the chain and adjusting for internal bonding and end effects. The charge at neutral pH is calculated by summing all the charged amino acids in the peptide at neutral pH.
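The charge calculation can be illustrated with a very small Python sketch. It assumes that lysine and arginine each count as +1, aspartate and glutamate each count as -1, and that histidine and the chain ends are ignored; whether the program makes the same choices is not stated here. Molecular weight is left out because the per-residue weights the program uses are not given in this text.

```python
def net_charge(peptide):
    """Approximate net charge at neutral pH from a one letter sequence."""
    peptide = peptide.upper()
    positive = sum(peptide.count(aa) for aa in "KR")   # Lys, Arg
    negative = sum(peptide.count(aa) for aa in "DE")   # Asp, Glu
    return positive - negative


charge = net_charge("MKRDEEA")    # 2 positive - 3 negative = -1
```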
The codon usage table presents a list of all 64 possible codons and the number of times that codon is used in the given peptide. This is useful for comparison to known codon bias tables to see if the peptide from the ORF has similar codon usage to that known for other genes from that organism.
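A table of this kind amounts to counting codons in the reading frame. The Python sketch below is only an illustration (the function name codon_usage is made up); it lists all 64 codons, including those that are never used.

```python
from collections import Counter
from itertools import product


def codon_usage(orf):
    """Count each of the 64 codons in an in-frame coding sequence."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # ignore a trailing partial codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    all_codons = ("".join(c) for c in product("ACGT", repeat=3))
    return {codon: counts.get(codon, 0) for codon in all_codons}


usage = codon_usage("ATGGCTGCTTAA")   # ATG once, GCT twice, TAA once
```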
Amino acid composition can be determined for each polypeptide found. The table displays the actual number of times each amino acid is present and the percentage of the total number of amino acids that each individual amino acid represents.
The Hopp-Woods hydrophilicity plot is used to determine which regions of the peptide are hydrophobic and which are hydrophilic based on the sequence of amino acids present in localized regions of the peptide. You can set the “size” of this localized window by defining the averaging length to be used in the calculations. The authors recommend a value of 6, which the DNA Inspector uses as a default value. Larger values smooth out noise, but may miss some signals. Smaller values increase noise levels, but may pick up signals impossible to see with larger values. Experiment if you are curious. The reference to the original paper is in the manual.
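The averaging itself is a simple moving window. The Python sketch below uses the hydrophilicity values as they are commonly tabulated from the Hopp and Woods paper; check them against the reference in the manual before relying on them. The function name and the choice to report one value per full window are assumptions made for this example, not a description of the program's code.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (verify before use).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(peptide, window=6):
    """Average hydrophilicity over every full window along the peptide."""
    values = [HOPP_WOODS[aa] for aa in peptide.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = hydrophilicity_profile("KKKKKKLLLLLL")   # 7 windows of length 6
```

In this example the profile falls steadily from the all-lysine window (strongly hydrophilic) to the all-leucine window (strongly hydrophobic). A larger window would flatten that transition; a smaller one would make it sharper.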
When you choose to do a self-homology analysis, you can compare the DNA sequence to itself, its reverse, its inverse, or its complementary sequence. The definitions for these terms (along with examples) are shown on screen when this option is chosen. Comparing a sequence to itself can point out regions of internal repeated structures.
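The three derived sequences can be written down in a few lines of Python. One assumption should be stated loudly: “inverse” is taken here to mean the reverse complement (the opposite strand read 5’ to 3’). The program's own on-screen definitions are the authority, so treat this only as a sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq):
    """Same strand read backwards."""
    return seq[::-1]


def complement(seq):
    """Each base replaced by its pairing partner, order unchanged."""
    return seq.translate(COMPLEMENT)


def inverse(seq):
    """ASSUMPTION: 'inverse' means the reverse complement."""
    return complement(seq)[::-1]


seq = "AACGT"
derived = (reverse(seq), complement(seq), inverse(seq))
# -> ("TGCAA", "TTGCA", "ACGTT")
```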
Choosing good searching parameters for the homology comparison is critical. You can dramatically increase or decrease the noise level as well as your signal. It is best to try several different values to get a feel for how your sequences compare. Choosing a search length of 10 with 2 allowable mismatches [10/2] will often give satisfactory results. Other values found to be useful are 6/0, 15/3, and 20/4. These will work for some sequences and not for others. The best approach is to experiment with the parameters until you optimize the signal to noise ratio and get meaningful data.
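To see why the length/mismatch pair controls the noise, it helps to look at a bare-bones dot matrix calculation. The Python sketch below is a straightforward (and slow) illustration, not the program's routine; the function name and the choice to place a dot at the start of each matching window are assumptions made for the example.

```python
def dot_matrix(seq_a, seq_b, length=10, max_mismatches=2):
    """Return the set of (i, j) window starts (0-based) that count as matches."""
    dots = set()
    for i in range(len(seq_a) - length + 1):
        window_a = seq_a[i:i + length]
        for j in range(len(seq_b) - length + 1):
            window_b = seq_b[j:j + length]
            mismatches = sum(a != b for a, b in zip(window_a, window_b))
            if mismatches <= max_mismatches:
                dots.add((i, j))
    return dots


seq = "ACGTTGCAAGCT"
strict = dot_matrix(seq, seq, length=6, max_mismatches=0)   # 6/0
loose = dot_matrix(seq, seq, length=6, max_mismatches=2)    # 6/2
```

With 6/0 this self-comparison gives only the main diagonal, since no 6 nt window occurs twice in the example sequence. Allowing mismatches can only add dots, never remove them, which is why loosening the parameters raises both signal and noise.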
The on-screen display can show a matrix of 200 x 200 dots. This represents breaking the DNA into 200 segments to be represented along each side of the matrix (a resolution of 0.5%). Larger matrices can be constructed to give you higher resolution, but these can only be seen on printouts. The higher resolutions are: 500 x 500 (0.2%), 1000 x 1000 (0.1%), and 2000 x 2000 (0.05%). The higher resolutions are quite accurate, especially on the LaserWriter, but they can take some time to print. It does not improve your analysis to use a higher number of segments than the length of your DNA (i.e. if your DNA is 500 nts long, a matrix of 1000 x 1000 would give you no more information than the 500 x 500 matrix).
When the 200 x 200 matrix is completed on screen, you can press the “zoom-in” button on the right to enable the mouse to be used to select a region of the matrix for more detailed analysis. If you have long sequences that you are comparing, the initial matrix may take some time to complete, even though the DNA Inspector is one of the fastest programs for this analysis. Once you have defined a region with the mouse, that region will be recalculated (not just expanded on screen, like fat bits in MacPaint) and redisplayed at the new detail. A “zoom-out” button will return you to the original plot.
There are three options available for base composition analysis. The first simply gives the base composition for each base - both a count and a percentage. The second option gives a partial length analysis by fragmenting the DNA into 20 pieces and displaying the results of each fragment as a list of values and as a graphical display (in color on Mac II or ImageWriter II with 4-color ribbon). This is useful for displaying regions of GC or AT rich sequences. The last option is a sliding base composition. This option allows you to define a “window” of any number of bases, to be moved along the DNA. As the window moves, the base composition in the window is calculated and displayed on a histogram. You set the window size and the number of bases to be moved each time the window moves using a simple diagram in the program.
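The sliding base composition can be sketched as follows in Python. The window size and step correspond to the two values you set in the program; the function name and the choice to report G+C as a percentage of each full window are assumptions made for this illustration.

```python
def sliding_gc(seq, window=100, step=10):
    """Return (first_nt, percent G+C) for each full window along the DNA."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append((start + 1, 100.0 * gc / window))   # 1-based position
    return result


profile = sliding_gc("GGGGCCCCAAAATTTT", window=8, step=4)
# -> [(1, 100.0), (5, 50.0), (9, 0.0)]
```

A small step gives a smoother curve at the cost of more calculations; a step equal to the window size gives non-overlapping pieces, much like the 20-piece partial length analysis.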
With this option, you can select an enzyme from a scrollable list of all the enzymes in the database. The program will present you with a list of all other enzymes that will RECOGNIZE the same sequence. Note that they might not all CUT at the same site.
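The difference between recognizing and cutting is easy to show with a tiny example. The four-enzyme table below is made up for this Python sketch and is not the program's database; the sites are written with “|” marking the cut position. SmaI and XmaI recognize the same sequence (CCCGGG) but cut at different positions within it.

```python
# A tiny illustrative table, NOT the program's database.
ENZYMES = {
    "EcoRI": "G|AATTC",
    "BamHI": "G|GATCC",
    "SmaI": "CCC|GGG",
    "XmaI": "C|CCGGG",
}


def isoschizomers(name, table=ENZYMES):
    """Other enzymes whose recognition sequence is identical, cut site ignored."""
    target = table[name].replace("|", "")
    return sorted(
        other for other, site in table.items()
        if other != name and site.replace("|", "") == target
    )


same_site = isoschizomers("SmaI")    # ["XmaI"]
```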
You can type in any sequence of up to about 60 characters and have the DNA Inspector search the restriction enzyme database for all enzymes that can recognize a site in the sequence you entered. This is a useful and very rapid routine for finding all the sites that exist in a short piece of “linker” DNA.
You can edit any entry in the restriction enzyme table simply by selecting it from the table. Other tables can be edited by loading them first. You can change the recognition sequence, the enzyme name, the cut site, and whether to search both strands of the DNA. You can enter any sequence you like and are not limited to restriction enzyme recognition sequences. For example, you may enter a TATAAA sequence, or any other sequence you search for often. You may also choose any enzyme from the database to be the new entry in the table.
To enter a new sequence into the table, you must follow a few simple rules. All letters in the sequence must be in uppercase and are limited to A, C, G, T, Y, R, N. Any other letters will be rejected by the program. The vertical bar “|” is used to signify the cut site (it can be left out if the cut site is unknown). To signify that multiple sites are possible, use parentheses and the slash: A(C/G)TA(G/C)T means to find ACTAGT, ACTACT, AGTAGT, or AGTACT. If the recognition sequence is not symmetrical (palindromic), you should specify that the second DNA strand is to be searched.
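The entry rules above map neatly onto a regular expression, which is one way to see what each symbol means. The Python sketch below is an illustration only: the function names are invented, Y, R, and N are given their usual meanings (pyrimidine, purine, any base), the cut marker is simply dropped, and only the strand as typed is searched.

```python
import re

# Usual meanings of the allowed letters: Y = C or T, R = A or G, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}


def site_to_regex(site):
    """Translate a table entry into a compiled pattern (cut marker dropped)."""
    out = []
    for ch in site.replace("|", ""):
        if ch == "(":
            out.append("(?:")
        elif ch == ")":
            out.append(")")
        elif ch == "/":
            out.append("|")           # alternatives inside parentheses
        elif ch in IUPAC:
            out.append(IUPAC[ch])
        else:
            raise ValueError(f"character not allowed in a site: {ch!r}")
    return re.compile("".join(out))


def find_sites(seq, site):
    """1-based start positions of every (possibly overlapping) site in seq."""
    pattern = site_to_regex(site).pattern
    # A lookahead matches without consuming, so overlapping sites are kept.
    return [m.start() + 1 for m in re.finditer(f"(?=(?:{pattern}))", seq)]


positions = find_sites("GAATTCGAATTC", "G|AATTC")        # [1, 7]
variants = find_sites("TTACTAGTGG", "A(C/G)TA(G/C)T")    # [3]
```

A lowercase entry such as "gaattc" raises an error in this sketch, mirroring the rule that only uppercase letters are accepted.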
Any table that you create can be saved and then loaded. It can also be modified at a later time. To edit any table, choose LOAD NEW ENZYME TABLE from the FILE menu. After editing, the table can be saved under the same or a different name. Before you start editing, it is a good idea to make a backup copy of the current table. If you make a mistake, you can then restore the table to its previous state.
When the DNA Inspector conducts a restriction enzyme analysis, it uses the restriction enzyme table that has been defined as the “default” enzyme table. You can specify any table you create as the default table. Thus, you may have several tables constructed and change the default table before an automated sequence analysis to enable you to automatically analyze the DNA using different sets of enzymes. The current default restriction enzyme table is saved in the “tdi2e.stuff” folder under the name “RE.Data”. Before you redefine any table as the default table, it is a good idea to make a backup of the original default file using the Finder.
Reconfiguring the keyboard provides you with the opportunity to choose the keys you would like to represent the allowable characters. Thus, instead of using ‘a’, ‘c’, ‘g’, and ‘t’ to represent A, C, G, and T, you may redefine the keyboard to have ‘1’, ‘2’, ‘3’, and ‘4’ represent A, C, G, and T. Redefining the keyboard in this way will facilitate sequence entering. The redefined keyboard will remain in effect until you exit the sequence entering module (“line editing a sequence”).
The cut & paste editing option allows you to move large numbers of nucleotides by cutting and pasting. You can display the sequence being edited as well as a “donor” sequence from which segments can be cut and pasted into the sequence you are currently editing. You can also search for a sequence entered at the keyboard. You can freely move between the cut & paste editing mode and the line editing mode. You can also move to the sequence confirming mode.
The line editing mode allows you to edit sequences one line at a time. Since you can redefine the keyboard, this mode is ideal for entering new sequences into the DNA Inspector. The entire sequence is displayed in a scrollable window which can be used to choose the sequence line to be edited. You can freely move between the line editing mode and the cut & paste editing mode.
Sequences can be confirmed either by speaking, or by typing. When you choose to confirm the sequence by speaking, the Mac will read the sequence to you, while you follow along on a printout. After noting any changes to be made, you can switch to the line editing mode to make the changes. Having the Macintosh read the sequence to you is a very rapid way to confirm a sequence. When confirming a sequence by typing, you are asked to type the sequence in a second time. As you enter each nucleotide, your entry is compared to the previous entry for that position. Any discrepancies are pointed out and you can make changes.
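Confirming by typing is, in essence, a position-by-position comparison of two entries. The short Python sketch below shows the idea; it is not the program's code, and the function name and the use of “-” for a missing position are choices made for the example.

```python
from itertools import zip_longest


def confirm(original, retyped):
    """Return (position, original_nt, retyped_nt) for every disagreement."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(
            zip_longest(original.upper(), retyped.upper(), fillvalue="-"),
            start=1,
        )
        if a != b
    ]


discrepancies = confirm("ACGTAC", "ACCTAC")    # [(3, "G", "C")]
```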
Entering a sequence is best done in the line editor. You can redefine the keyboard to allow you to choose a layout that is convenient for you. For example, you may decide to define the keys 1-4 to represent the nucleotides A, C, G, and T. In this way, the keys you use the most will be adjacent and will facilitate sequence entry. You can also have the Macintosh speak each nucleotide as it is entered to provide you with aural feedback for each entry. You can move from the line editor to the cut & paste editor freely.
In the line editing mode, you may choose to delete nucleotides from any individual line. While you are in the line editor, the DNA Inspector will not automatically adjust the other lines to fill in the resulting spaces, so that the original sequence numbering is preserved for you to use in subsequent editing. If you want to adjust the DNA sequence to fill in any gaps that have resulted from the deletion of nucleotides, you can choose the compact sequence option.
In the line editor, there are several special keys to help you edit. The left and right arrows will move the cursor (editing position) to the left and to the right, respectively. The space bar will move the cursor to the right and delete the character to the right at the same time. The delete key will backspace and delete the character to the left of the cursor position. The ‘>’ key acts like the right arrow key and the ‘<’ key acts like the left arrow key.
In the cut & paste editor, 1000 nucleotides are displayed at a time. If the sequence you are editing is longer than 1000 nucleotides, you can move to each 1000 nucleotide segment with the ‘NEXT’ and ‘PREV’ buttons. These buttons are only selectable when appropriate and will show either the next 1000 nucleotides or the previous 1000 nucleotides.
A useful feature of the cut & paste editor is the ability to FIND a sequence within the sequence being edited. Pressing the FIND button, or selecting FIND from the menu, will allow you to enter a sequence to search for. After entering the search sequence, the DNA Inspector will search for the first match in the sequence being edited and will display it highlighted on the screen. You can either stop at that position or continue the search for further matches with the search sequence.
In the cut & paste editor, you can load other DNA sequences in addition to the sequence being edited. These ‘DONOR’ sequences can be used as a source of DNA fragments to copy and paste into the sequence being edited. Pressing the LOAD DONOR sequence button will allow you to select a sequence file to serve as the DONOR sequence. Once the sequence is loaded, the button changes to a SHOW DONOR button, that can be used to display the donor sequence.
In the cut & paste editor, you can choose to replace any character with another one. For example, if you want to highlight the positions of all A’s in your sequence, you might choose to replace all A’s with ‘*’. Although the new sequence cannot be used in any of the analysis routines, it can be printed and will distinctly highlight aspects of the sequence you might be interested in.
When you are finished with the cut & paste editing, you should select the “FINISHED EDITING” button. This will finalize all the changes you have made and give you the option of checking the sequence for any non-nucleotide characters.
When you are working in the cut & paste editor and are working on a donor sequence, the SHOW EDIT SEQUENCE button becomes active. This button will display (and let you work on) the edit sequence. It will hide (but not discard) the donor sequence. Once pressed, the SHOW EDIT SEQUENCE button changes into a SHOW DONOR SEQUENCE button.
In the cut & paste editor, you may clear a donor sequence from memory with this button. After the current donor sequence is cleared from memory, you may load another donor sequence for use in editing. If you will be needing many donor sequences, it is probably better to use the RECOMBINE DNA FRAGMENTS option from the main screen.
The GO TO button in the cut & paste editor allows you to enter the position in the nucleotide sequence that you want to be displayed in the editing window. After you enter the position to be displayed, the DNA Inspector will find that position and will display the appropriate segment of 1000 nucleotides containing that location in the editing window.
When recombining DNA fragments, the invert option will signify that the sequence fragment chosen should be inverted prior to placing into the new construct.
The RECOMBINE DNA option will allow you to specify up to 10 different DNA fragments to be joined together into a contiguous sequence. You enter the segments in order from the 5’ end to the 3’ end by selecting files from a disk and then defining the actual nucleotide positions in the chosen sequences to be joined. You have the option of inverting a sequence before placing it into the new construct. The new construct can be saved as a DNA Inspector file.
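The joining step can be sketched in Python as shown below. Two assumptions are made and should be checked against the manual: positions are 1-based and inclusive, and “inverting” a fragment means taking its reverse complement. The function name recombine and the tuple layout are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def recombine(pieces):
    """Join fragments in 5' to 3' order.

    pieces: list of (sequence, first, last, invert), 1-based inclusive.
    ASSUMPTION: invert means reverse complement.
    """
    if len(pieces) > 10:
        raise ValueError("at most 10 fragments can be joined")
    parts = []
    for seq, first, last, invert in pieces:
        fragment = seq[first - 1:last]
        if invert:
            fragment = fragment.translate(COMPLEMENT)[::-1]
        parts.append(fragment)
    return "".join(parts)


construct = recombine([
    ("AAACCCGGG", 1, 6, False),   # contributes AAACCC
    ("ACGGT", 2, 4, True),        # CGG inverted becomes CCG
])
# -> "AAACCCCCG"
```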
When entering or editing a sequence in the line editor mode, you can have the DNA Inspector automatically save the sequence after a specified interval by using the “auto-save” option. Choosing this option will cause the DNA Inspector to save the sequence as it exists every few minutes (you are asked to specify the interval when you choose the auto-save option). This option may be useful to you if you do not regularly save your work, or if you work in an environment that has many distractions or interruptions: the DNA Inspector will not forget to save the sequence when you are distracted.